Anaphoric Annotation in the ARRAU Corpus
Abstract
ARRAU is a new corpus annotated for anaphoric relations. It includes information about agreement and explicitly represents multiple antecedents for ambiguous anaphoric expressions, as well as discourse antecedents for expressions that refer to abstract entities such as events, actions, and plans. The corpus contains texts from different genres: task-oriented dialogues from the Trains-91 and Trains-93 corpora, narratives from the English Pear Stories corpus, newspaper articles from the Wall Street Journal portion of the Penn Treebank, and mixed text from the Gnome corpus.
Related Resources
ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions
This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phen...
Influence of Text Type and Text Length on Anaphoric Annotation
We report the results of a study that investigates the agreement of anaphoric annotations. The study focuses on the influence of the factors text length and text type on a corpus of scientific articles and newspaper texts. In order to measure inter-annotator agreement we compare existing approaches and we propose to measure each step of the annotation process separately instead of measuring the...
Annotation of Anaphoric Expressions in an Aligned Bilingual Corpus
This paper discusses a French-English corpus annotated and aligned at the anaphoric level. It also presents an annotation scheme based on the study of a detailed corpus featuring different types of correspondences and mismatches. The scheme, which is adapted from the EAGLES recommendations, supports alignment at the anaphoric level and caters for the different kinds of mismatches.
Arabic anaphora resolution: corpora annotation with coreferential links
Annotated resources are much needed for the evaluation and training of anaphora resolution systems. Coreferential chain annotation is a difficult task that cannot be carried out without an appropriate tool. In this paper, we present our work on Arabic corpora annotation with anaphoric links (i.e., the annotation of the identity relation between the anaphors and their antecedents). In particular,...
Anaphoric Annotation of Wikipedia and Blogs in the Live Memories Corpus
The Live Memories corpus is an Italian corpus annotated for anaphoric relations. The corpus includes manually annotated information about morphosyntactic agreement, anaphoricity, and the semantic class of the NPs. For the annotation of anaphoric links, the corpus takes into account phenomena specific to Italian, such as incorporated clitics and phonetically unrealized pronouns. The Live...